Improving Phrase-Based Machine Translation
Abstract
Current state-of-the-art machine translation systems use a phrase-based scoring model for choosing among candidate translations in a target language, typically English. These models are deemed phrase-based because candidate sentence scores are in large part a product of phrase translation probabilities. These translation probabilities must be learned in some unsupervised manner from a pair of sentence-aligned corpora. With the end goal of improving upon the published results of such systems, our project proceeded in two stages. First, we attempted to duplicate the performance results of existing end-to-end translation systems by piecing together available components and engineering the remainder guided by published techniques. Second, we identified two significant shortcomings of published systems and attempted to remedy them via machine learning techniques. In particular, we chose to learn phrase translation probabilities directly rather than deriving them heuristically. We also augmented the scoring model to relax a troublesome independence assumption across phrases.
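As a rough illustration of the phrase-based scoring described above, the following minimal sketch scores a candidate translation as a product of phrase translation probabilities, computed as a sum of logs. The phrase table, its probabilities, and the function names are hypothetical and chosen only for illustration. They are not taken from the paper, and a real system would combine this term with other components such as a language model.

```python
import math

# Hypothetical phrase table: P(target phrase | source phrase).
# The entries and values are invented for illustration only.
PHRASE_TABLE = {
    ("la maison", "the house"): 0.8,
    ("la maison", "the home"): 0.2,
    ("bleue", "blue"): 0.9,
    ("bleue", "sad"): 0.1,
}


def log_score(phrase_pairs, table=PHRASE_TABLE, floor=1e-9):
    """Log of the product of phrase translation probabilities.

    Each phrase pair contributes independently, which is the
    independence assumption across phrases mentioned in the abstract.
    Unseen pairs receive a small floor probability (an assumption here).
    """
    return sum(math.log(table.get(pair, floor)) for pair in phrase_pairs)


def best_candidate(candidates):
    """Return the candidate segmentation with the highest score."""
    return max(candidates, key=log_score)


candidate_a = [("la maison", "the house"), ("bleue", "blue")]
candidate_b = [("la maison", "the home"), ("bleue", "sad")]
chosen = best_candidate([candidate_b, candidate_a])
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. Because each phrase pair is scored in isolation, the sketch cannot capture dependencies between neighboring phrases.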
Similar resources
Improving Statistical Machine Translation with Monolingual Collocation
This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT. The experimental results show that our method improves the performance of...
A Comparative Study of English-Persian Translation of Neural Google Translation
Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...
Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation
In this article, an original view on how to improve phrase translation estimates is proposed. This proposal is grounded on two main ideas: first, that appropriate examples of a given phrase should participate more in building its translation distribution; second, that paraphrases can be used to better estimate this distribution. Initial experiments provide evidence of the potential of our appro...
Improving Phrase-Based Statistical Translation by Modifying Phrase Extraction and Including Several Features
Nowadays, most of the statistical translation systems are based on phrases (i.e. groups of words). In this paper we study different improvements to the standard phrase-based translation system. We describe a modified method for the phrase extraction which deals with larger phrases while keeping a reasonable number of phrases. We also propose additional features which lead to a clear improvement...
Improving Neural Machine Translation through Phrase-based Forced Decoding
Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using this cost to rerank the n-best NMT outputs. The main ...